Sorrento: A Self-Organizing Storage Cluster for Parallel Data-Intensive Applications

Authors

  • Hong Tang
  • Aziz Gulbeden
  • Jingyu Zhou
  • Lingkun Chu
  • Tao Yang
Abstract

This paper describes the design and implementation of Sorrento, a self-organizing storage cluster built upon commodity components. Sorrento complements previous research on distributed file and storage systems by focusing on incremental expandability and manageability of the system and on design choices for optimizing the performance of parallel data-intensive applications with low write-sharing patterns. Sorrento virtualizes distributed storage devices as incrementally expandable volumes and automatically manages storage node additions and failures. Its consistency model uses a version-based scheme for data updating and replica management, which is especially suitable for data-intensive applications where distributed processes access disjoint datasets most of the time. To further facilitate parallel I/O, Sorrento provides load-aware or locality-driven data placement and an adaptive migration strategy. This paper presents experimental results that demonstrate the features and performance of Sorrento using both microbenchmarks and trace replay of real applications from several domains, including scientific computing, data mining, and offline processing for web search.

Similar resources

A Survey: Load Balancing for Distributed File System

Distributed systems are useful for computation and storage of large-scale data at dispersed locations. A Distributed File System (DFS) is a subsystem of a distributed system. A DFS is a means of sharing storage space and data. Servers, storage devices, and clients are at dispersed locations in a DFS. Fault tolerance and scalability are two main features of a distributed file system. Performance of DFS is...

High-Performance Storage Support for Scientific Big Data Applications on the Cloud

This work studies the storage subsystem for scientific big data applications running on the cloud. Although cloud computing has become one of the most popular paradigms for executing data-intensive applications, the storage subsystem has not been optimized for scientific applications. In particular, many scientific applications were originally developed assuming a tightly-coupled cluster ...

Implementation and Evaluation of Parallel Data Mining on PC Cluster and Optimization of its Execution Environments

Personal computer/workstation clusters have been studied intensively in the field of parallel and distributed computing. From the viewpoint of applications, data-intensive applications such as data mining and ad-hoc query processing in databases are considered very important for high-performance computing, as are conventional scientific calculations. We have built and evaluated PC cluster pil...

Accelerating text mining workloads in a MapReduce-based distributed GPU environment

Scientific computations have been using GPU-enabled computers successfully, often relying on distributed nodes to overcome the limitations of device memory. Only a handful of text mining applications benefit from such infrastructure. Since the initial steps of text mining are typically data-intensive, and the ease of deployment of algorithms is an important factor in developing advanced applica...

Runtime Data Declustering over SAN-Connected PC Cluster System

Recently, personal computer/workstation (PC/WS) clusters have been studied intensively in the field of parallel and distributed computing. From the viewpoint of applications, data-intensive applications including data mining and ad-hoc query processing in databases are considered very important for massively parallel processors, in addition to conventional scientific calculations. Thus, ...

Publication date: 2003